Lexically-Based Terminology Structuring
نویسندگان
چکیده
Terminology structuring has been the subject of much work in the context of terms extracted from corpora: given a set of terms, obtained from an existing resource or extracted from a corpus, it consists in identifying hierarchical (or other types of) relations between these terms. The present work aims at assessing the feasibility of such structuring by studying it on an existing hierarchically structured terminology. Our overall goal is to test various structuring methods proposed in the literature and to check how they fare on this task. The specific goal at the present stage of our work, which we report here, is focussed on lexical methods that match terms on the basis on their content words, taking morphological variants and synonyms into account. We describe experiments performed on the French version of the US National Library of Medicine MeSH thesaurus. We compare the lexically-induced relations with the original MeSH relations and measure recall and precision metrics, taking two different views on the task: relation recovery and term placement. This method proposes correct term placement for up to 26% of the MeSH concepts, and its precision can reach 58%. After this quantitative evaluation, we perform a qualitative, human analysis of the ‘new’ relations not present in the MeSH. This analysis shows, on the one hand, the limits of the lexical structuring method. On the other hand, it reveals some specific structuring choices and naming conventions made by the MeSH designers, and emphasizes ontological commitments that cannot be left to automatic structuring.
منابع مشابه
Lexically-Based Terminology Structuring: Some Inherent Limits
Terminology structuring has been the subject of much work in the context of terms extracted from corpora: given a set of terms, obtained from an existing resource or extracted from a corpus, identifying hierarchical (or other types of) relations between these terms. The present paper focusses on terminology structuring by lexical methods, which match terms on the basis on their content words, t...
متن کاملThe Construction of a Lexically Motivated Corpus | the Problem with Dening Lexical Units |
We are currently constructing a special language corpus, with particular emphasis on such lexically oriented studies and applications as terminology and information retrieval. In the paper, we focus on the problem with de ning the lexical units in running texts, which is essential for the construction of a lexically motivated corpus. Although we focus on Japanese in this paper, the problem disc...
متن کاملTools for Terminology Processing
Automatic terminology processing appeared 10 years ago when electronic corpora became widely available. Such processing may be statistically or linguistically based and produces terminology resources that can be used in a number of applications : indexing, information retrieval, technology watch, etc. We present the tools that have been developed in the IRIN Institute. They all take as input te...
متن کاملDesign And Implementation Of A Terminology-Based Literature Mining And Knowledge Structuring System
The purpose of the study is to develop an integrated knowledge management system for the domains of genome and nano-technology, in which terminology-based literature mining, knowledge acquisition, knowledge structuring, and knowledge retrieval are combined. The system supports integrating different databases (papers and patents, technologies and innovations) and retrieving different types of kn...
متن کاملA Lexico-Semantic Approach To The Structuring Of Terminology
This paper discusses a number of implications of using either a conceptual approach or a lexico-semantic approach to terminology structuring, especially for interpreting data supplied by corpora for the purpose of building specialized dictionaries. A simple example, i.e., program, will serve as a basis for showing how relationships between terms are captured in both approaches. My aim is to dem...
متن کامل